Overview

An Exploration into the Intersections of Salaries and Social Identity

In this project, I use the Salary by Job Title and Country dataset available on Kaggle.
I cleaned the data so that equivalent degrees and similar job titles are grouped under one label. I also excluded observations with one-off job titles in order to focus on the most common jobs.

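Research questions:

  • Does gender and/or race have any impact on the salary someone receives?
  • Do any countries have higher salaries than others? Why might that be?
  • Which jobs have higher or lower salaries?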

Motivations: In 2021, about two-thirds of employees in the STEM workforce were men and about one-third were women (NCSES). As a woman going into the STEM field, I thought it would be interesting to see how the salaries compared. When I found this dataset with many other variables, I decided to look at other social identities as well.

Definition of Social Identity: Social identity is the part of self-concept that is derived from memberships in social groups or categories (APA). Social identities explored in this project:

  • Age
  • Nationality
  • Race
  • Gender
  • Education Level
  • Occupation

Data

First 500 Observations

Variables

There are 6647 observations and 8 variables in this dataset.

  • Age: the age of the employee
  • Gender: the gender of the employee
  • Education Level: the highest degree the employee has earned
  • Job Title: the title of the job/job category the employee possesses
  • Years of Experience: the number of years the employee has worked in that field
  • Salary: the yearly salary of the employee (US Dollars)
  • Country: the country in which they are employed
  • Race: the race of the employee
A glimpse of the data:
Rows: 6,647
Columns: 8
$ Age                   <dbl> 32, 28, 36, 29, 42, 31, 26, 38, 29, 48, 35, 40, …
$ Gender                <chr> "Male", "Female", "Female", "Male", "Female", "M…
$ `Education Level`     <chr> "Bachelor's Degree", "Master's Degree", "Bachelo…
$ `Job Title`           <chr> "Software Engineer", "Data Analyst", "Sales Repr…
$ `Years of Experience` <dbl> 5, 3, 7, 2, 12, 4, 1, 10, 3, 18, 6, 14, 2, 16, 7…
$ Salary                <dbl> 90000, 65000, 60000, 55000, 120000, 80000, 45000…
$ Country               <chr> "UK", "USA", "USA", "USA", "USA", "China", "Chin…
$ Race                  <chr> "White", "Hispanic", "Hispanic", "Hispanic", "As…

Age & Experience

Age vs Years of Experience

Age vs Salary

Analysis

It makes sense that years of experience increase with age, which we can see in the scatter plot. A few people don’t quite fit the pattern, such as someone who started a career younger than most or someone who started over in a new career later in life. Overall, however, there is a strong positive linear pattern in this dataset.

Salary also tends to increase with age, as expected given the added experience. Salary still depends on the job, because a younger person can start in a higher-paying job than an older person holds. Overall, there is a positive linear pattern between age and salary.

Salary by Country

Overall Salary Histogram

Salary per Country

Map

Analysis

Salary Histogram: The distribution of the salaries in this dataset is multimodal. There are many peaks and dips throughout the histogram. The summary statistics for the salary variable are:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  25000   70000  115000  115454  160000  250000

Salary per Country: The countries all have similar salaries.
The median salaries for each country are:

  • United States: $110,000
  • Australia: $115,000
  • UK: $115,000
  • China: $117,460
  • Canada: $120,000

All of the countries have the same minimum salary of $25,000. However, they have different maximums. Canada has the highest and Australia has the lowest.

Social Identities

People per Race

Salary by Race

Salary by Gender

People per Education Level

Salary by Education Level

Analysis

Race: The largest group in this dataset is White, followed closely by Asian, then Black, Mixed, and Hispanic. Mixed has the highest median salary, while Hispanic has the lowest. All races have the same minimum, but Asian and Black have the highest maximums.
The median salaries for each race are:

  • Hispanic: $104,830
  • Asian: $115,000
  • White: $115,000
  • Black: $118,000
  • Mixed: $120,000

Gender: This boxplot shows that males have higher salaries than females. The median salaries for each gender are:

  • Female: $105,000
  • Male: $120,000

Education Level: Most people in this dataset have a Bachelor’s Degree, followed by a Master’s, then a PhD. Overall, people with Bachelor’s Degrees have the lowest salaries, those with Master’s Degrees fall in the middle, and those with PhDs have the highest.
The median salaries for each education level are:

  • Bachelor’s Degree: $80,000
  • Master’s Degree: $120,000
  • PhD: $170,000

Most Common Jobs

Salaries for Common Jobs

Education Level for Common Jobs

Analysis

These are the five most common jobs in this dataset. I created a separate data frame to examine these five jobs on their own. Software engineers and data analysts tend to make the most, followed by marketing employees, human resources employees, and finally developers.
The median salaries for these jobs are:

  • Software Engineer: $154,636
  • Data Analyst: $150,000
  • Marketing: $95,000
  • Human Resources: $92,000
  • Developer: $70,000

Looking at the education levels among these five jobs, we can see that a large share of data analysts have PhDs. There are no PhDs among developers, and most of them have Bachelor’s degrees. Many human resources employees have Master’s degrees. Marketing employees and software engineers mostly have Bachelor’s degrees, but a good number have Master’s degrees or PhDs.

Random Jobs

Salaries for Random Jobs

Education Level for Random Jobs

Analysis

I chose five more jobs, picked informally rather than by random sampling, that I thought were well known, popular, and likely to show interesting results. I then created a separate data frame to examine these five jobs on their own. Researchers tend to make the most, followed by financial advisors, who also have the largest spread in salary among these jobs. IT support is next, then accountants, and finally sales representatives.
The median salaries for these jobs are:

  • Researcher: $160,000
  • Financial Advisor: $120,000
  • IT Support: $110,000
  • Accountant: $55,000
  • Sales Representative: $30,000

Looking at the education levels among these five jobs, we can see that all of the accountants have Bachelor’s degrees. Most of the financial advisors have a Bachelor’s, with some Master’s and one PhD. IT support employees mostly have Bachelor’s degrees, with some Master’s. All but one of the researchers have PhDs. Surprisingly, a majority of the sales representatives have a Master’s degree.

Conclusion

Conclusions: Looking at salary distributions across the social identities in this dataset, I found that salary increases with age, which tracks years of experience. Salaries were very similar across the countries represented in this dataset. The distributions were also similar across races, with Hispanic having the lowest median and Mixed the highest. Males have higher salaries than females, as shown by the median and maximum for the two genders. There is also a difference in salaries between the education levels, which makes sense because more education is generally associated with higher salaries. Of the 10 specific careers examined in this project, researchers, software engineers, and data analysts make the most, while developers, accountants, and sales representatives make the least.

Limitations: One limitation of this project is that the dataset contains mostly higher-paying jobs, so it does not accurately describe overall salaries among these groups. For example, the median salary in the U.S. is about $40,000-$50,000, which is less than half of the $110,000 U.S. median in this dataset.

Potential Future Directions: Future directions could include exploring more aspects of social identities like religion or sexual orientation. It would be good to look at more countries, maybe some that aren’t first-world countries.

Audience: This project would be useful for college graduates or other young people trying to decide what career field to go into. They can look at these statistics and compare them to their own social identities to see what a potential salary could look like for them. It could also be used by people in the workforce to compare their salary to others’.

About the Author: My name is Lindsey Winslow. I am an undergraduate student at the University of Dayton. I am graduating in May 2024 with a Bachelor of Science in Education with a major in Education & Allied Studies, along with a Bachelor of Arts with a major in Mathematics. I am interested in pursuing full-time employment in the corporate data analytics field or in something education-related like curriculum development.

You can connect with me on LinkedIn

---
title: "Salaries & Social Identity"
output: 
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: default
      navbar-bg: "green"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(DT)
library(plotly)
salary<- read_csv("~/Desktop/MTH 209/SalaryFinal.csv")
```

Overview
===
**An Exploration into the Intersections of Salaries and Social Identity**

In this project, I use the Salary by Job Title and Country dataset available on [Kaggle](https://www.kaggle.com/datasets/amirmahdiabbootalebi/salary-by-job-title-and-country/data).  
I cleaned the data so that equivalent degrees and similar job titles are grouped under one label. I also excluded observations with one-off job titles in order to focus on the most common jobs.
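
The cleaning itself is not shown in this dashboard. As a rough sketch of what such a step could look like, the block below collapses spelling variants of a degree into one label and drops job titles that appear only once. The `raw` data frame, its spellings, and the cutoff of more than one occurrence are all assumptions made for illustration, not the actual cleaning rules.

```r
library(dplyr)

# Hypothetical raw rows; these spellings are illustrative, not taken from the Kaggle file
raw <- tibble(
  Education_Level = c("Bachelor's", "Bachelor's Degree", "PhD", "phD", "Master's Degree"),
  Job_Title       = c("Software Engineer", "Software Engineer", "Data Analyst",
                      "Data Analyst", "Chief Happiness Officer")
)

cleaned <- raw %>%
  # Collapse spelling variants of the same degree into a single label
  mutate(Education_Level = case_when(
    grepl("^bachelor", Education_Level, ignore.case = TRUE) ~ "Bachelor's Degree",
    grepl("^master",   Education_Level, ignore.case = TRUE) ~ "Master's Degree",
    grepl("^phd",      Education_Level, ignore.case = TRUE) ~ "PhD",
    TRUE ~ Education_Level
  )) %>%
  # Keep only job titles that occur more than once (assumed cutoff)
  add_count(Job_Title, name = "n_title") %>%
  filter(n_title > 1) %>%
  select(-n_title)

cleaned
```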

**Research questions:** 

- Does gender and/or race have any impact on the salary someone receives?
- Do any countries have higher salaries than others? Why might that be?
- Which jobs have higher or lower salaries?

**Motivations:** In 2021, about two-thirds of employees in the STEM workforce were men and about one-third were women [(NCSES).](https://ncses.nsf.gov/pubs/nsf23315/report/the-stem-workforce#:~:text=The%20share%20of%20women%20and,(figure%202%2D3)) As a woman going into the STEM field, I thought it would be interesting to see how the salaries compared. When I found this dataset with many other variables, I decided to look at other social identities as well.

**Definition of Social Identity:** Social identity is the part of self-concept that is derived from memberships in social groups or categories [(APA).](https://dictionary.apa.org/social-identity)
Social identities explored in this project:

- Age
- Nationality
- Race
- Gender
- Education Level
- Occupation

Data
===

Column {data-width=450}
---

### <b><font size=4><span Style = "color:blue">First 500 Observations</span></font></b>

```{r show_table}
datatable(salary[1:500,], rownames=FALSE, colnames= c("Age", "Gender", "Education Level", "Job Title", "Years of Experience", "Salary", "Country", "Race"), options=list(pageLength=20))
```

Column {data-width=550}
---

### <font size= 4><span Style = "color:blue">Variables</span></font>

There are 6647 observations and 8 variables in this dataset.

- Age: the age of the employee
- Gender: the gender of the employee
- Education Level: the highest degree the employee has earned
- Job Title: the title of the job/job category the employee possesses
- Years of Experience: the number of years the employee has worked in that field
- Salary: the yearly salary of the employee (US Dollars)
- Country: the country in which they are employed
- Race: the race of the employee

A glimpse of the data:
```{r}
glimpse(salary)
```

Age & Experience
===

Column {.tabset data-width=550}
---

### Age vs Years of Experience
```{r}
# Rename columns that contain spaces so later chunks can use them without backticks
salary<- salary %>% rename(
  Job_Title= `Job Title`,
  Education_Level=`Education Level`,
  Years_of_Experience=`Years of Experience`)
ggplot(salary, aes(x=Age, y=Years_of_Experience))+
  geom_point(color="#458B74")+
  labs(title="Age vs Years of Experience", y= "Years of Experience")+
  theme(text=element_text(size=20))
```

### Age vs Salary
```{r}
ggplot(salary, aes(x=Age, y=Salary))+
  geom_point(color="#6E8B3D")+
  labs(title="Age vs Salary")+
  theme(text=element_text(size=20))
```

Column {data-width=450}
---

### Analysis

It makes sense that years of experience increase with age, which we can see in the scatter plot. A few people don't quite fit the pattern, such as someone who started a career younger than most or someone who started over in a new career later in life. Overall, however, there is a strong positive linear pattern in this dataset.

Salary also tends to increase with age, as expected given the added experience. Salary still depends on the job, because a younger person can start in a higher-paying job than an older person holds. Overall, there is a positive linear pattern between age and salary.
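
One way to put a number on these linear patterns is a Pearson correlation and the slope of a simple linear model. The sketch below uses a small made-up data frame named `toy` so that it runs on its own. The values are illustrative only, and the dashboard's `salary` data frame would take its place in practice.

```r
# Toy data for illustration only; swap in the dashboard's `salary` data frame
toy <- data.frame(
  Age                 = c(25, 30, 35, 40, 45, 50),
  Years_of_Experience = c(1, 5, 9, 15, 20, 24),
  Salary              = c(50000, 70000, 95000, 120000, 150000, 160000)
)

# Pearson correlation: values near +1 indicate a strong positive linear relationship
r_age_experience <- cor(toy$Age, toy$Years_of_Experience)
r_age_salary     <- cor(toy$Age, toy$Salary)

# Slope of a simple linear model: estimated change in salary per additional year of age
fit <- lm(Salary ~ Age, data = toy)
salary_per_year_of_age <- coef(fit)[["Age"]]

c(r_age_experience = r_age_experience, r_age_salary = r_age_salary)
salary_per_year_of_age
```

A correlation describes only the linear part of the relationship, so it is best read alongside the scatter plots rather than in place of them.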

Salary by Country
===

Column {.tabset data-width=550}
---

### Overall Salary Histogram
```{r}
ggplot(salary, aes(x=Salary))+
  geom_histogram(fill="#00CD66", bins=30)+
  labs(title="Distribution of Salary")+
  theme(text=element_text(size=20))
```

### Salary per Country
```{r}
ggplot(salary, aes(x=Country, y=Salary))+
  geom_boxplot(fill="#C1FFC1")+
  labs(title="Distribution of Salary by Country", y="Salary")+
  theme(text=element_text(size=20))
```

### Map
```{r}
map<- map_data("world")
salary_map<- salary %>%
  group_by(Country) %>%
  summarize(medsalary=median(Salary))%>%
  left_join(map, by=c("Country"="region"))
ggplot(salary_map, aes(long, lat, group=group))+
  geom_polygon(aes(fill=medsalary), color="white")+
  scale_fill_viridis_c(option="C")+
  labs(fill="Median Salary per Country")+
  theme_void()+
  theme(legend.position="bottom")+
  theme(text=element_text(size=8))
```


Column {data-width=450}
---

### Analysis

**Salary Histogram:** The distribution of the salaries in this dataset is multimodal. There are many peaks and dips throughout the histogram. The summary statistics for the salary variable are:

       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      25000   70000  115000  115454  160000  250000
    
**Salary per Country:** The countries all have similar salaries.  
The median salaries for each country are:

- United States: $110,000
- Australia: $115,000
- UK: $115,000
- China: $117,460
- Canada: $120,000

All of the countries have the same minimum salary of $25,000. However, they have different maximums. Canada has the highest and Australia has the lowest.
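
Figures like these can be produced with `summary()` for the overall distribution and a grouped `summarize()` for the per-country minimum, median, and maximum. The block below is a minimal sketch on a made-up `toy` tibble, so its numbers are placeholders rather than the dataset's values.

```r
library(dplyr)

# Toy data for illustration only; swap in the dashboard's `salary` data frame
toy <- tibble(
  Country = c("USA", "USA", "USA", "Canada", "Canada", "Canada"),
  Salary  = c(25000, 110000, 200000, 25000, 120000, 250000)
)

# Minimum, quartiles, median, mean, and maximum for the whole column
summary(toy$Salary)

# Minimum, median, and maximum salary per country, sorted by median
country_summary <- toy %>%
  group_by(Country) %>%
  summarize(
    min_salary    = min(Salary),
    median_salary = median(Salary),
    max_salary    = max(Salary),
    .groups = "drop"
  ) %>%
  arrange(median_salary)

country_summary
```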

Social Identities
===

Column {.tabset data-width=550}
---

### People per Race
```{r}
ggplot(salary, aes(x=Race))+
  geom_bar(fill="#87CEFA")+
  labs(title="Number of People per Race", y="Count")+
  theme(text=element_text(size=20))
```

### Salary by Race
```{r}
ggplot(salary, aes(x=Race, y=Salary))+
  geom_boxplot(fill="#E0FFFF")+
  labs(title="Distribution of Salary by Race")+
  theme(text=element_text(size=20))
```

### Salary by Gender
```{r}
ggplot(salary, aes(x=Gender, y=Salary))+
  geom_boxplot(fill="#BCEE68")+
  labs(title="Distribution of Salary by Gender")+
  theme(text=element_text(size=20))
```

### People per Education Level
```{r}
ggplot(salary, aes(x=Education_Level))+
  geom_bar(fill="#8B668B")+
  labs(title="Number of People per Education Level", y="Count", x="Education Level")+
  theme(text=element_text(size=17))
```

### Salary by Education Level
```{r}
ggplot(salary, aes(x=Education_Level, y=Salary))+
  geom_boxplot(fill="#FFBBFF")+
  labs(title="Distribution of Salary by Education Level", x="Education Level")+
  theme(text=element_text(size=16))
```

Column {data-width=450}
---

### Analysis

**Race:** The largest group in this dataset is White, followed closely by Asian, then Black, Mixed, and Hispanic. Mixed has the highest median salary, while Hispanic has the lowest. All races have the same minimum, but Asian and Black have the highest maximums.  
The median salaries for each race are:

- Hispanic: $104,830
- Asian: $115,000 
- White: $115,000 
- Black: $118,000
- Mixed: $120,000

**Gender:** This boxplot shows that males have higher salaries than females.
The median salaries for each gender are:

- Female: $105,000
- Male: $120,000

**Education Level:** Most people in this dataset have a Bachelor's Degree, followed by a Master's, then a PhD. Overall, people with Bachelor's Degrees have the lowest salaries, those with Master's Degrees fall in the middle, and those with PhDs have the highest.  
The median salaries for each education level are:

- Bachelor's Degree: $80,000
- Master's Degree: $120,000
- PhD: $170,000
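
Because the same median calculation is repeated for race, gender, and education level, it can be wrapped in a small helper. The function `median_salary_by()` below is a hypothetical name introduced for this sketch, and the `toy` tibble holds made-up values. It assumes the data has a `Salary` column, as the dashboard's `salary` data frame does.

```r
library(dplyr)

# Hypothetical helper: group size and median salary for any grouping column
median_salary_by <- function(data, group) {
  data %>%
    group_by({{ group }}) %>%
    summarize(n = n(), median_salary = median(Salary), .groups = "drop") %>%
    arrange(median_salary)
}

# Toy data for illustration only; swap in the dashboard's `salary` data frame
toy <- tibble(
  Gender          = c("Female", "Female", "Female", "Male", "Male", "Male"),
  Education_Level = c("Bachelor's Degree", "Master's Degree", "PhD",
                      "Bachelor's Degree", "Master's Degree", "PhD"),
  Salary          = c(70000, 105000, 160000, 80000, 120000, 170000)
)

by_gender    <- median_salary_by(toy, Gender)
by_education <- median_salary_by(toy, Education_Level)

by_gender
by_education
```

Reporting the group size `n` next to each median helps show when a median rests on only a few observations.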

Most Common Jobs
===

Column {.tabset data-width=600}
---

### Salaries for Common Jobs
```{r}
TopFive<- salary %>%
  filter(Job_Title %in% c("Software Engineer", "Marketing", "Data Analyst",
                          "Developer", "Human Resources"))
ggplot(TopFive, aes(x=Job_Title, y=Salary))+
  geom_boxplot(fill="#00CD66")+
  labs(title="Distribution of Salary by Job Title", subtitle = "Most Common Five", x="Job Title")+
  theme(text=element_text(size=15),
        axis.text.x=element_text(size=10))
```

### Education Level for Common Jobs
```{r}
ggplot(TopFive, aes(x=Job_Title, fill=Education_Level))+
  geom_bar(position="fill")+
  scale_y_continuous(breaks=seq(0,1,by=0.2), labels=scales::percent)+
  labs(title="Distribution of Education by Job", subtitle = "Most Common Five", x="Job Title", y="Percent of People", fill="Education Level")+
  theme(text=element_text(size=15),
        axis.text.x=element_text(size=7))
```

Column {data-width=400}
---

### Analysis

These are the five most common jobs in this dataset. I created a separate data frame to examine these five jobs on their own. Software engineers and data analysts tend to make the most, followed by marketing employees, human resources employees, and finally developers.  
The median salaries for these jobs are:

- Software Engineer: $154,636
- Data Analyst: $150,000 
- Marketing: $95,000 
- Human Resources: $92,000
- Developer: $70,000

Looking at the education levels among these five jobs, we can see that a large share of data analysts have PhDs. There are no PhDs among developers, and most of them have Bachelor's degrees. Many human resources employees have Master's degrees. Marketing employees and software engineers mostly have Bachelor's degrees, but a good number have Master's degrees or PhDs.
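
The five titles in this section are written out by hand in the filter. As an alternative, the most common titles could be derived from the data by counting and sorting. The helper `most_common_titles()` below is a hypothetical name used for this sketch, and the `toy` tibble is made up. The example asks for the top two titles only because the toy data is small.

```r
library(dplyr)

# Toy job titles for illustration only; swap in the dashboard's `salary` data frame
toy <- tibble(
  Job_Title = c(rep("Software Engineer", 4), rep("Data Analyst", 3),
                rep("Marketing", 2), "Accountant")
)

# Hypothetical helper: count rows per title, sort by frequency, keep the k most frequent
most_common_titles <- function(data, k = 5) {
  data %>%
    count(Job_Title, sort = TRUE, name = "count") %>%
    slice_head(n = k) %>%
    pull(Job_Title)
}

top_titles <- most_common_titles(toy, k = 2)

# Keep only the rows whose title is among the most common ones
top_rows <- toy %>% filter(Job_Title %in% top_titles)

top_titles
nrow(top_rows)
```

Deriving the list this way keeps the filter in step with the data if the cleaning rules change, though ties at the cutoff would still need a decision by hand.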

Random Jobs
===

Column {.tabset data-width=600}
---

### Salaries for Random Jobs
```{r}
RandomFive<- salary %>%
  filter(Job_Title %in% c("Sales Representative", "Financial Advisor", "Researcher",
                          "Accountant", "IT Support"))
ggplot(RandomFive, aes(x=Job_Title, y=Salary))+
  geom_boxplot(fill="#4EEE94")+
  labs(title="Distribution of Salary by Job Title", subtitle="Random Five", x="Job Title")+
  theme(text=element_text(size=15),
        axis.text.x=element_text(size=9))
```

### Education Level for Random Jobs
```{r}
ggplot(RandomFive, aes(x=Job_Title, fill=Education_Level))+
  geom_bar(position="fill")+
  scale_y_continuous(breaks=seq(0,1,by=0.2), labels=scales::percent)+
  labs(title="Distribution of Education by Job", subtitle = "Random Five", x="Job Title", y="Percent of People", fill="Education Level")+
  theme(text=element_text(size=15),
        axis.text.x=element_text(size=6))
```

Column {data-width=400}
---

### Analysis

I chose five more jobs, picked informally rather than by random sampling, that I thought were well known, popular, and likely to show interesting results. I then created a separate data frame to examine these five jobs on their own. Researchers tend to make the most, followed by financial advisors, who also have the largest spread in salary among these jobs. IT support is next, then accountants, and finally sales representatives.  
The median salaries for these jobs are:

- Researcher: $160,000
- Financial Advisor: $120,000
- IT Support: $110,000
- Accountant: $55,000
- Sales Representative: $30,000

Looking at the education levels among these five jobs, we can see that all of the accountants have Bachelor's degrees. Most of the financial advisors have a Bachelor's, with some Master's and one PhD. IT support employees mostly have Bachelor's degrees, with some Master's. All but one of the researchers have PhDs. Surprisingly, a majority of the sales representatives have a Master's degree.
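
The stacked bar chart shows shares visually. The same shares can be tabulated by counting each job and education level pair and dividing by the job's total. This is a minimal sketch on a made-up `toy` tibble, so the counts are placeholders and not the dataset's.

```r
library(dplyr)

# Toy data for illustration only; swap in a data frame such as `RandomFive`
toy <- tibble(
  Job_Title       = c("Accountant", "Accountant",
                      "Researcher", "Researcher", "Researcher", "Researcher"),
  Education_Level = c("Bachelor's Degree", "Bachelor's Degree",
                      "PhD", "PhD", "PhD", "Master's Degree")
)

# Count each (job, education) pair, then convert counts to within-job shares
education_share <- toy %>%
  count(Job_Title, Education_Level) %>%
  group_by(Job_Title) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

education_share
```

Keeping the raw count `n` next to each share makes statements such as "all but one" easy to check.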

Conclusion
===

**Conclusions:** Looking at salary distributions across the social identities in this dataset, I found that salary increases with age, which tracks years of experience. Salaries were very similar across the countries represented in this dataset. The distributions were also similar across races, with Hispanic having the lowest median and Mixed the highest. Males have higher salaries than females, as shown by the median and maximum for the two genders. There is also a difference in salaries between the education levels, which makes sense because more education is generally associated with higher salaries. Of the 10 specific careers examined in this project, researchers, software engineers, and data analysts make the most, while developers, accountants, and sales representatives make the least.

**Limitations:** One limitation of this project is that the dataset contains mostly higher-paying jobs, so it does not accurately describe overall salaries among these groups. For example, the median salary in the U.S. is about \$40,000-\$50,000, which is less than half of the \$110,000 U.S. median in this dataset.

**Potential Future Directions:** Future directions could include exploring more aspects of social identities like religion or sexual orientation. It would be good to look at more countries, maybe some that aren't first-world countries. 

**Audience:** This project would be useful for college graduates or other young people trying to decide what career field to go into. They can look at these statistics and compare them to their own social identities to see what a potential salary could look like for them. It could also be used by people in the workforce to compare their salary to others'.

**About the Author:** My name is Lindsey Winslow. I am an undergraduate student at the University of Dayton. I am graduating in May 2024 with a Bachelor of Science in Education with a major in Education & Allied Studies, along with a Bachelor of Arts with a major in Mathematics. I am interested in pursuing full-time employment in the corporate data analytics field or in something education-related like curriculum development.

You can connect with me on [LinkedIn](https://www.linkedin.com/in/lindsey-winslow-79b537306/)